
09. Serialization

Beyond accessing model attributes directly via their field names (e.g. model.foobar), models can be converted, dumped, serialized, and exported in a number of ways.

Pydantic uses the terms "serialize" and "dump" interchangeably. Both refer to the process of converting a model to a dictionary or JSON-encoded string.

model.model_dump(...)

This is the primary way of converting a model to a dictionary. Sub-models will be recursively converted to dictionaries.

The one exception to sub-models being converted to dictionaries is that RootModel and its subclasses will have the root field value dumped directly, without a wrapping dictionary. This is also done recursively.

from typing import Any, List, Optional
from pydantic import BaseModel, Field, Json

class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    banana: Optional[float] = 1.1
    foo: str = Field(serialization_alias='foo_alias')
    bar: BarModel

m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

# returns a dictionary:
print(m.model_dump())
#> {'banana': 3.14, 'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(include={'foo', 'bar'}))
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(exclude={'foo', 'bar'}))
#> {'banana': 3.14}
print(m.model_dump(by_alias=True))
#> {'banana': 3.14, 'foo_alias': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
        exclude_unset=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(banana=1.1, foo='hello', bar={'whatever': 123}).model_dump(
        exclude_defaults=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
        exclude_defaults=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
    FooBarModel(banana=None, foo='hello', bar={'whatever': 123}).model_dump(
        exclude_none=True
    )
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}


class Model(BaseModel):
    x: List[Json[Any]]


print(Model(x=['{"a": 1}', '[1, 2]']).model_dump())
#> {'x': [{'a': 1}, [1, 2]]}
print(Model(x=['{"a": 1}', '[1, 2]']).model_dump(round_trip=True))
#> {'x': ['{"a":1}', '[1,2]']}
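The RootModel exception described above can be illustrated with a minimal sketch. The Pets and Owner classes are hypothetical names used only for this illustration; the point is that a RootModel field is dumped as its bare root value rather than as a nested dictionary.

```python
from typing import List

from pydantic import BaseModel, RootModel


class Pets(RootModel[List[str]]):  # hypothetical RootModel wrapping a list
    pass


class Owner(BaseModel):  # hypothetical model that embeds the RootModel
    name: str
    pets: Pets


owner = Owner(name='Ann', pets=['dog', 'cat'])
dumped = owner.model_dump()

# 'pets' holds the list itself, not {'root': [...]}
print(dumped)
#> {'name': 'Ann', 'pets': ['dog', 'cat']}
```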

model.model_dump_json(...)

The .model_dump_json() method serializes a model directly to a JSON-encoded string that is equivalent to the result produced by .model_dump().

from datetime import datetime
from pydantic import BaseModel

class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    foo: datetime
    bar: BarModel

m = FooBarModel(foo=datetime(2032, 6, 1, 12, 13, 14), bar={'whatever': 123})
print(m.model_dump_json())
#> {"foo":"2032-06-01T12:13:14","bar":{"whatever":123}}
print(m.model_dump_json(indent=2))
"""
{
  "foo": "2032-06-01T12:13:14",
  "bar": {
    "whatever": 123
  }
}
"""

Common parameters

BaseModel - Pydantic

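Each entry lists the parameter name, its type, a description, and the default value.

  • indent (int | None; model_dump_json only): Indentation to use in the JSON output. If None, the output is compact. Default: None.
  • mode (Literal['json', 'python'] | str; model_dump only): The mode in which to_python should run. If 'json', the output will only contain JSON-serializable types. If 'python', the output may contain Python objects that are not JSON-serializable. Default: 'python'.
  • include (IncEx): Field(s) to include in the output. Default: None.
  • exclude (IncEx): Field(s) to exclude from the output. Default: None.
  • context (dict[str, Any] | None): Additional context to pass to the serializer. Default: None.
  • by_alias (bool): Whether to serialize using field aliases. Default: False.
  • exclude_unset (bool): Whether to exclude fields that have not been explicitly set. Default: False.
  • exclude_defaults (bool): Whether to exclude fields whose value equals their default value. Default: False.
  • exclude_none (bool): Whether to exclude fields whose value is None. Default: False.
  • round_trip (bool): If True, dumped values should be valid as input for non-idempotent types such as Json[T]. Default: False.
  • warnings (bool | Literal['none', 'warn', 'error']): How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, 'error' raises a PydanticSerializationError. Default: True.
  • serialize_as_any (bool): Whether to serialize fields with duck-typing serialization behavior. Default: False.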
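To make the mode parameter more concrete, here is a small sketch. The Event model is a hypothetical example, not part of the original text. It compares the default 'python' mode, the 'json' mode, and the parsed result of model_dump_json().

```python
import json
from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):  # hypothetical model for illustration
    name: str
    when: datetime


e = Event(name='launch', when=datetime(2032, 6, 1, 12, 13, 14))

python_mode = e.model_dump()  # keeps the datetime object
json_mode = e.model_dump(mode='json')  # datetime becomes an ISO 8601 string
from_json_string = json.loads(e.model_dump_json())

print(python_mode)
#> {'name': 'launch', 'when': datetime.datetime(2032, 6, 1, 12, 13, 14)}
print(json_mode)
#> {'name': 'launch', 'when': '2032-06-01T12:13:14'}
print(json_mode == from_json_string)
#> True
```

In this sketch the 'json' mode dictionary matches what you get by parsing the JSON string, which is handy when another library is going to do the final encoding.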

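dict(model) and iteration

Pydantic models can also be converted to dictionaries using dict(model). This conversion is not recursive: the raw field values are returned, so sub-models will not be converted to dictionaries.

You can also iterate over a model's fields using for field_name, field_value in model: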

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int

class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel

m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(dict(m))
#> {'banana': 3.14, 'foo': 'hello', 'bar': BarModel(whatever=123)}
for name, value in m:
    print(f'{name}: {value}')
#> banana: 3.14
#> foo: hello
#> bar: whatever=123

Note also that RootModel does get converted to a dictionary with the key 'root'.
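The difference between dict() and model_dump() for a RootModel can be sketched as follows. Numbers is a hypothetical class introduced only for this example.

```python
from typing import List

from pydantic import RootModel


class Numbers(RootModel[List[int]]):  # hypothetical RootModel for illustration
    pass


n = Numbers([1, 2, 3])

as_dict = dict(n)  # iterates over the model's single 'root' field
dumped = n.model_dump()  # dumps the root value directly

print(as_dict)
#> {'root': [1, 2, 3]}
print(dumped)
#> [1, 2, 3]
```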

Custom serializers

Pydantic provides several functional serializers to customise how a model is serialized to a dictionary or JSON.

from datetime import datetime, timedelta, timezone
from typing import Any, Dict

from pydantic import BaseModel, ConfigDict, field_serializer, model_serializer


class WithCustomEncoders(BaseModel):
    model_config = ConfigDict(ser_json_timedelta='iso8601')

    dt: datetime
    diff: timedelta

    @field_serializer('dt')
    def serialize_dt(self, dt: datetime, _info):
        return dt.timestamp()


m = WithCustomEncoders(
    dt=datetime(2032, 6, 1, tzinfo=timezone.utc), diff=timedelta(hours=100)
)
print(m.model_dump_json())
#> {"dt":1969660800.0,"diff":"P4DT4H"}


class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> Dict[str, Any]:
        return {'x': f'serialized {self.x}'}


print(Model(x='test value').model_dump_json())
#> {"x":"serialized test value"}

A single serializer can also be called on all fields by passing the special value '*' to the @field_serializer decorator.
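As a rough sketch of the '*' form mentioned above (the model and the upper_strings method are hypothetical), one serializer can be applied to every field and decide per value what to do:

```python
from typing import Any

from pydantic import BaseModel, field_serializer


class Model(BaseModel):  # hypothetical model for illustration
    name: str
    city: str
    age: int

    @field_serializer('*')
    def upper_strings(self, value: Any):
        # Runs for every field; only string values are changed.
        return value.upper() if isinstance(value, str) else value


dumped = Model(name='ann', city='oslo', age=30).model_dump()
print(dumped)
#> {'name': 'ANN', 'city': 'OSLO', 'age': 30}
```

The return annotation is left off on purpose here, since the function returns different types for different fields.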

In addition, PlainSerializer and WrapSerializer enable you to use a function to modify the output of serialization.

Both serializers accept optional arguments including:

  • return_type specifies the return type for the function. If omitted it will be inferred from the type annotation.
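  • when_used specifies when this serializer should be used. Accepts a string with values 'always', 'unless-none', 'json', and 'json-unless-none'. Defaults to 'always'.

PlainSerializer uses a simple function to modify the output of serialization.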
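The when_used option can be sketched with 'unless-none'. The Shout alias and the Model class below are hypothetical; the assumption is that a None value skips the custom function and is dumped as None.

```python
from typing import Annotated, Optional

from pydantic import BaseModel
from pydantic.functional_serializers import PlainSerializer

# Hypothetical alias: upper-case the string, but leave None untouched.
Shout = Annotated[
    Optional[str],
    PlainSerializer(lambda s: s.upper(), return_type=str, when_used='unless-none'),
]


class Model(BaseModel):  # hypothetical model for illustration
    tag: Shout = None


with_value = Model(tag='abc').model_dump()
without_value = Model().model_dump()

print(with_value)
#> {'tag': 'ABC'}
print(without_value)
#> {'tag': None}
```

Without 'unless-none', the lambda would be called with None and fail on s.upper().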

from typing_extensions import Annotated
from pydantic import BaseModel
from pydantic.functional_serializers import PlainSerializer

FancyInt = Annotated[
    int, PlainSerializer(lambda x: f'{x:,}', return_type=str, when_used='json')
]


class MyModel(BaseModel):
    x: FancyInt


print(MyModel(x=1234).model_dump())
#> {'x': 1234}

print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,234'}

WrapSerializer receives the raw inputs along with a handler function that applies the standard serialization logic, and can modify the resulting value before returning it as the final output of serialization.

from typing import Any

from typing_extensions import Annotated

from pydantic import BaseModel, SerializerFunctionWrapHandler
from pydantic.functional_serializers import WrapSerializer


def ser_wrap(v: Any, nxt: SerializerFunctionWrapHandler) -> str:
    return f'{nxt(v + 1):,}'


FancyInt = Annotated[int, WrapSerializer(ser_wrap, when_used='json')]


class MyModel(BaseModel):
    x: FancyInt


print(MyModel(x=1234).model_dump())
#> {'x': 1234}

print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,235'}

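Overriding the return type of model_dump

The return value of .model_dump() can usually be described as dict[str, Any], but with @model_serializer you can make it return a value of a different type.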

from pydantic import BaseModel, model_serializer

class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> str:
        return self.x

print(Model(x='not a dict').model_dump())
#> not a dict

If you want to do this and still get proper type-checking for this method, you can override .model_dump() in an if TYPE_CHECKING: block:

from typing import TYPE_CHECKING, Any

from typing_extensions import Literal

from pydantic import BaseModel, model_serializer


class Model(BaseModel):
    x: str

    @model_serializer
    def ser_model(self) -> str:
        return self.x

    if TYPE_CHECKING:
        # Ensure type checkers see the correct return type
        def model_dump(
            self,
            *,
            mode: Literal['json', 'python'] | str = 'python',
            include: Any = None,
            exclude: Any = None,
            by_alias: bool = False,
            exclude_unset: bool = False,
            exclude_defaults: bool = False,
            exclude_none: bool = False,
            round_trip: bool = False,
            warnings: bool = True,
        ) -> str:
            ...

This trick is actually used in RootModel for precisely this purpose.

Serializing subclasses

Subclasses of standard types

Subclasses of standard types are automatically dumped like their base classes:

from datetime import date, timedelta
from typing import Any, Type

from pydantic_core import core_schema

from pydantic import BaseModel, GetCoreSchemaHandler


class DayThisYear(date):
    """
    Contrived example of a special type of date that
    takes an int and interprets it as a day in the current year
    """

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            cls.validate,
            core_schema.int_schema(),
            serialization=core_schema.format_ser_schema('%Y-%m-%d'),
        )

    @classmethod
    def validate(cls, v: int):
        return date(2023, 1, 1) + timedelta(days=v)


class FooModel(BaseModel):
    date: DayThisYear


m = FooModel(date=300)
print(m.model_dump_json())
#> {"date":"2023-10-28"}

Subclasses of BaseModel, dataclasses, and TypedDict

When using fields whose annotations are themselves struct-like types (e.g., BaseModel subclasses, dataclasses, etc.), the default behavior is to serialize the attribute value as though it was an instance of the annotated type, even if it is a subclass. More specifically, only the fields from the annotated type will be included in the dumped object:

from pydantic import BaseModel

class User(BaseModel):
    name: str

class UserLogin(User):
    password: str

class OuterModel(BaseModel):
    user: User

user = UserLogin(name='pydantic', password='hunter2')

m = OuterModel(user=user)
print(m)
#> user=UserLogin(name='pydantic', password='hunter2')
print(m.model_dump()) # note: the password field is not included
#> {'user': {'name': 'pydantic'}}
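If the subclass fields are wanted in the output, the serialize_as_any flag listed among the common parameters can opt in to duck-typed serialization. The sketch below reuses the same models and assumes a Pydantic version that supports this runtime flag.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


class OuterModel(BaseModel):
    user: User


m = OuterModel(user=UserLogin(name='pydantic', password='hunter2'))

# Default: serialize according to the annotated type (User).
default_dump = m.model_dump()
# Opt-in: serialize according to the runtime type (UserLogin).
duck_typed_dump = m.model_dump(serialize_as_any=True)

print(default_dump)
#> {'user': {'name': 'pydantic'}}
print(duck_typed_dump)
#> {'user': {'name': 'pydantic', 'password': 'hunter2'}}
```

The default is the safer choice for data such as passwords, so duck-typed dumping is best enabled only where leaking subclass fields is not a concern.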

pickle.dumps(model)

Pydantic models support efficient pickling and unpickling.

import pickle

from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: int


m = FooBarModel(a='hello', b=123)
print(m)
#> a='hello' b=123
data = pickle.dumps(m)
print(data[:20])
#> b'\x80\x04\x95\x95\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main_'
m2 = pickle.loads(data)
print(m2)
#> a='hello' b=123

Advanced include and exclude

The model_dump and model_dump_json methods support include and exclude arguments which can either be sets or dictionaries. This allows nested selection of which fields to export:

from pydantic import BaseModel, SecretStr


class User(BaseModel):
    id: int
    username: str
    password: SecretStr


class Transaction(BaseModel):
    id: str
    user: User
    value: int


t = Transaction(
    id='1234567890',
    user=User(id=42, username='JohnDoe', password='hashedpassword'),
    value=9876543210,
)

# using a set:
print(t.model_dump(exclude={'user', 'value'}))
#> {'id': '1234567890'}

# using a dict:
print(t.model_dump(exclude={'user': {'username', 'password'}, 'value': True}))
#> {'id': '1234567890', 'user': {'id': 42}}

print(t.model_dump(include={'id': True, 'user': {'id'}}))
#> {'id': '1234567890', 'user': {'id': 42}}

The True indicates that we want to exclude or include an entire key, just as if we included it in a set. This can be done at any depth level.
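To illustrate the "any depth level" point, here is a sketch with three levels of nesting. Address, Customer, and Order are hypothetical models chosen for this example.

```python
from pydantic import BaseModel


class Address(BaseModel):  # hypothetical models for illustration
    street: str
    city: str


class Customer(BaseModel):
    name: str
    address: Address


class Order(BaseModel):
    id: int
    customer: Customer


order = Order(
    id=1,
    customer=Customer(name='Ann', address=Address(street='1 Main St', city='Oslo')),
)

# Exclude a single field two levels down.
no_street = order.model_dump(exclude={'customer': {'address': {'street'}}})
# Include only one deeply nested field; works for JSON output too.
only_city = order.model_dump_json(include={'customer': {'address': {'city'}}})

print(no_street)
#> {'id': 1, 'customer': {'name': 'Ann', 'address': {'city': 'Oslo'}}}
print(only_city)
#> {"customer":{"address":{"city":"Oslo"}}}
```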

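Model- and field-level include and exclude

In addition to the explicit exclude and include arguments passed to model_dump and model_dump_json, we can also pass exclude: bool directly to the Field constructor.

Setting exclude on the field constructor (Field(..., exclude=True)) takes priority over the exclude/include arguments on model_dump and model_dump_json: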

from pydantic import BaseModel, Field, SecretStr


class User(BaseModel):
    id: int
    username: str
    password: SecretStr = Field(..., exclude=True)

class Transaction(BaseModel):
    id: str
    value: int = Field(exclude=True)


t = Transaction(
    id='1234567890',
    value=9876543210,
)

print(t.model_dump())
#> {'id': '1234567890'}
print(t.model_dump(include={'id': True, 'value': True}))  # lower priority, so it has no effect
#> {'id': '1234567890'}

That being said, setting exclude on the field constructor (Field(..., exclude=True)) does not take priority over the exclude_unset, exclude_none, and exclude_defaults parameters on model_dump and model_dump_json:

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str
    age: int | None = Field(None, exclude=False)


person = Person(name='Jeremy')

print(person.model_dump())
#> {'name': 'Jeremy', 'age': None}
print(person.model_dump(exclude_none=True))
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_unset=True))
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_defaults=True))
#> {'name': 'Jeremy'}

Serialization context

You can pass a context object to the serialization methods which can be accessed from the info argument to decorated serializer functions. This is useful when you need to dynamically update the serialization behavior at runtime. For example, if you wanted a field to be dumped depending on a dynamically controllable set of allowed values, this could be done by passing the allowed values by context:

from pydantic import BaseModel, SerializationInfo, field_serializer


class Model(BaseModel):
    text: str

    @field_serializer('text')
    def remove_stopwords(self, v: str, info: SerializationInfo):
        context = info.context
        if context:
            stopwords = context.get('stopwords', set())
            v = ' '.join(w for w in v.split() if w.lower() not in stopwords)
        return v


model = Model.model_construct(**{'text': 'This is an example document'})
print(model.model_dump()) # no context
#> {'text': 'This is an example document'}
print(model.model_dump(context={'stopwords': ['this', 'is', 'an']}))
#> {'text': 'example document'}
print(model.model_dump(context={'stopwords': ['document']}))
#> {'text': 'This is an example'}
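Another possible use of context, sketched under assumptions: the Account model and the 'reveal' key are hypothetical and not part of Pydantic. A field is masked by default and only dumped in full when the caller asks for it.

```python
from pydantic import BaseModel, SerializationInfo, field_serializer


class Account(BaseModel):  # hypothetical model for illustration
    owner: str
    card_number: str

    @field_serializer('card_number')
    def mask_card(self, v: str, info: SerializationInfo) -> str:
        context = info.context or {}
        # 'reveal' is a made-up context key used only in this sketch.
        if context.get('reveal'):
            return v
        return '*' * (len(v) - 4) + v[-4:]


account = Account(owner='ann', card_number='1234567812345678')

masked = account.model_dump()
revealed = account.model_dump(context={'reveal': True})

print(masked)
#> {'owner': 'ann', 'card_number': '************5678'}
print(revealed)
#> {'owner': 'ann', 'card_number': '1234567812345678'}
```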

model_copy(...)

model_copy() allows models to be duplicated (with optional updates), which is particularly useful when working with frozen models.

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int


class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel


m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(m.model_copy(update={'banana': 0}))
#> banana=0 foo='hello' bar=BarModel(whatever=123)
print(id(m.bar) == id(m.model_copy().bar))
#> True
# normal copy gives the same object reference for bar
print(id(m.bar) == id(m.model_copy(deep=True).bar))
#> False
# deep copy gives a new object reference for `bar`
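The frozen-model case mentioned above can be sketched like this. Settings is a hypothetical model; the sketch assumes that assigning to a field of a frozen model raises a ValidationError, and that model_copy(update=...) produces a modified copy without touching the original.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Settings(BaseModel):  # hypothetical frozen model for illustration
    model_config = ConfigDict(frozen=True)

    host: str
    port: int


settings = Settings(host='localhost', port=8000)

try:
    settings.port = 9000  # not allowed on a frozen model
    assignment_failed = False
except ValidationError:
    assignment_failed = True

# Create a changed copy instead of mutating in place.
updated = settings.model_copy(update={'port': 9000})

print(assignment_failed)
#> True
print(settings.port, updated.port)
#> 8000 9000
```

Note that values passed through update are not validated, so they should come from a trusted source.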